Abstract. Forest automata (FA) have recently been proposed as a tool for shape
analysis of complex heap structures. FA encode sets of tree decompositions of
heap graphs in the form of tuples of tree automata. In order to allow for representing
complex heap graphs, the notion of FA allowed one to provide user-defined
FA (called boxes) that encode repetitive graph patterns of shape graphs to be used
as alphabet symbols of other, higher-level FA. In this paper, we propose a novel
technique of automatically learning the FA to be used as boxes that avoids the
need of providing them manually. Further, we propose a significant improvement
of the automata abstraction used in the analysis. The result is an efficient, fully-automated
analysis that can handle even data structures as complex as skip lists,
with performance comparable to state-of-the-art fully-automated tools based
on separation logic, which, however, specialise in dealing with linked lists only.



1 Introduction 

Dealing with programs that use complex dynamic linked data structures belongs to
the most challenging tasks in formal program analysis. The reason is a necessity of
coping with infinite sets of reachable heap configurations that have a form of complex 
graphs. Representing and manipulating such sets in a sufficiently general, efficient, and 
automated way is a notoriously difficult problem. 

In [6], a notion of forest automata (FA) has been proposed for representing sets of
reachable configurations of programs with complex dynamic linked data structures. FA
have a form of tuples of tree automata (TA) that encode sets of heap graphs decomposed
into tuples of tree components whose leaves may refer back to the roots of the
components. In order to allow for dealing with complex heap graphs, FA may be hierarchically
nested by using them as alphabet symbols of other, higher-level FA. Alongside
the notion of FA, a shape analysis applying FA in the framework of abstract regular tree
model checking (ARTMC) [2] has been proposed in [6] and implemented in the Forester
tool. ARTMC accelerates the computation of sets of reachable program configurations 
represented by FA by abstracting their component TA, which is done by collapsing 
some of their states. The analysis was experimentally shown to be capable of proving 
memory safety of quite rich classes of heap structures as well as to be quite efficient. 
However, it relied on the user to provide the needed nested FA — called boxes — to be 
used as alphabet symbols of the top-level FA. 

In this paper, we propose a new shape analysis based on FA that avoids the need of 
manually providing the appropriate boxes. For that purpose, we propose a technique of 
automatically learning the FA to be used as boxes. The basic principle of the learning 
stems from the reason for which boxes were originally introduced into FA. In
particular, FA must have a separate component TA for each node (called a join) of the
represented graphs that has multiple incoming edges. If the number of joins is
unbounded (as, e.g., in doubly linked lists, abbreviated as DLLs below), unboundedly
many component TA are needed in flat FA. However, when some of the edges are hidden
in a box (as, e.g., the prev and next links of DLLs in Fig. 1) and replaced by a single
box-labelled edge, a finite number of component TA may suffice. Hence, the basic idea
of our learning is to identify subgraphs of the FA-represented graphs that contain at
least one join, and when they are enclosed (or, as we say later on, folded) into a
box, the in-degree of the join decreases.

Fig. 1: (a) A DLL, (b) a hierarchical encoding of a DLL.

There are, of course, many ways to select the above mentioned subgraphs to be 
used as boxes. To choose among them, we propose several criteria that we found useful 
in a number of experiments. Most importantly, the boxes must be reusable in order to 
allow eliminating as many joins as possible. The general strategy here is to choose boxes 
that are simple and small since these are more likely to correspond to graph patterns that 
appear repeatedly in typical data structures. For instance, in the already mentioned case
of DLLs, it is enough to use a box enclosing a single pair of next/prev links. On the 
other hand, as also discussed below, too simple boxes are sometimes not useful either. 

Further, we propose a way how box learning can be efficiently integrated into the 
main analysis loop. In particular, we do not use the perhaps obvious approach of incre- 
mentally building a database of boxes whose instances would be sought in the generated 
FA. We found this approach inefficient due to the costly operation of finding instances 
of different boxes in FA-represented graphs. Instead, we always try to identify which 
subgraphs of the graphs represented by a given FA could be folded into a box, followed 
by looking into the so-far built database of boxes whether such a box has already been 
introduced or not. Moreover, this approach has the advantage that it allows one to use 
simple language inclusion checks for approximate box folding, replacing a set of sub- 
graphs that appear in the graphs represented by a given FA by a larger set, which some- 
times greatly accelerates the computation. Finally, to further improve the efficiency, we 
interleave the process of box learning with the automata abstraction into a single itera- 
tive process. In addition, we propose an FA-specific improvement of the basic automata 
abstraction which accelerates the abstraction of an FA using components of other FA. 
Intuitively, it lets the abstraction synthesize an invariant faster by allowing it to combine 
information coming from different branches of the symbolic computation. 

We have prototyped the proposed techniques in Forester and evaluated them on a number
of challenging case studies. The results show that the obtained approach is both
quite general and efficient. We were, e.g., able to fully-automatically analyse programs
with 2-level and 3-level skip lists, which, to the best of our knowledge,
no other fully-automated analyser can handle. On the other hand, our implementation
achieves performance comparable to, and sometimes even better than, that of Predator [4]
(a winner of the heap manipulation division of SV-COMP'13) on list manipulating programs
despite being able to handle much more general classes of heap graphs.



Related work. As discussed already above, we propose a new shape analysis based 
upon the notion of forest automata introduced in [6]. The new analysis is extended
by a mechanism for automatically learning the needed nested FA, which is carefully
integrated into the main analysis loop in order to maximize its efficiency. Moreover,
we formalize the abstraction used in [6], which was not done in [6], and subsequently
significantly refine it in order to improve both its generality as well as efficiency. 

From the point of view of efficiency and degree of automation, the main alterna- 
tive to our approach is the fully-automated use of separation logic with inductive list 
predicates as implemented in Space Invader [TT| or SLAyer |T|. These approaches are, 
however, much less general than our approach since they are restricted to programs over 
certain classes of linked lists (and cannot handle even structures such as linked lists with
data pointers pointing either inside the list nodes or optionally outside of them, which
we can easily handle as discussed later on). A similar comparison applies to the Predator
tool inspired by separation logic but using purely graph-based algorithms [4]. The
work [9] on overlaid data structures mentions an extension of Space Invader to trees,
but this extension is of a limited generality and requires some manual help. 

In 13], an approach for synthesising inductive predicates in separation logic is pro- 
posed. This approach is shown to handle even tree-like structures with additional point- 
ers. One of these structures, namely, the so-called mcf trees implementing trees whose 
nodes have an arbitrary number of successors linked in a DLL, is even more general 
than what can in principle be described by hierarchically nested FA (to describe mcf 
trees, recursively nested FA or FA based on hedge automata would be needed). On the 
other hand, this approach seems quite dependent on exploiting the fact that the
encountered data structures are built in a "nice" way conforming to the structure of the 
predicate to be learnt (meaning, e.g., that lists are built by adding elements at the end 
only), which is close to providing an inductive definition of the data structure. 

The work [10] proposes an approach which uses separation logic for generating
numerical abstractions of heap manipulating programs allowing for checking both their
safety as well as termination. The described experiments include even verification of
programs with 2-level skip lists. However, the work still expects the user to manually
provide an inductive definition of skip lists in advance. Likewise, the work [3] based on
the so-called separating shape graphs reports on verification of programs with 2-level
skip lists, but it also requires the user to come up with summary edges to be used for
summarizing skip list segments, hence basically with an inductive definition of skip
lists. Compared to [10, 3], we did not have to provide any manual aid whatsoever to our
technique when dealing with 2-level as well as 3-level skip lists in our experiments. 

A concept of inferring graph grammar rules for the heap abstraction proposed in fS] 
has recently appeared in [11]. However, the proposed technique can so far only handle 
much less general structures than in our case. 



2 Forest Automata 

Given a word $\alpha = a_1 \cdots a_n$, $n \geq 1$, we write $\alpha_i$ to denote its $i$-th symbol $a_i$. Given a total
map $f : A \to B$, we use $dom(f)$ to denote its domain $A$ and $img(f)$ to denote its image.



Graphs. A ranked alphabet is a finite set of symbols $\Sigma$ associated with a mapping
$\# : \Sigma \to \mathbb{N}_0$ that assigns ranks to symbols. A (directed, ordered, labelled) graph over $\Sigma$
is a total map $g : V \to \Sigma \times V^*$ which assigns to every node $v \in V$ (1) a label from $\Sigma$,
denoted as $\ell_g(v)$, and (2) a sequence of successors from $V^*$, denoted as $S_g(v)$, such
that $\#\ell_g(v) = |S_g(v)|$. We drop the subscript $g$ if no confusion may arise. Nodes $v$ with
$S(v) = \varepsilon$ are called leaves. For any $v \in V$ such that $g(v) = (a, v_1 \cdots v_n)$, we call the pair
$v \mapsto (a, v_1 \cdots v_n)$ an edge of $g$. The in-degree of a node in $V$ is the overall number of its
occurrences in $g(v)$ across all $v \in V$. The nodes of a graph $g$ with an in-degree larger
than one are called joins of $g$.
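
As an illustration of the definitions above, the following minimal Python sketch computes in-degrees and joins. The dictionary encoding of a graph (node to a pair of label and successor tuple) and the function names `in_degrees` and `joins` are assumptions made for this example only; they are not part of the formal development.

```python
from collections import Counter

def in_degrees(g):
    """g maps each node to (label, tuple_of_successors)."""
    deg = Counter({v: 0 for v in g})
    for _label, succs in g.values():
        deg.update(succs)  # one occurrence per successor position
    return deg

def joins(g):
    """Nodes whose in-degree is larger than one."""
    return {v for v, d in in_degrees(g).items() if d > 1}

# A three-cell DLL; successors are ordered as (next, prev).
dll = {
    "a": ("cell", ("b", "null")),
    "b": ("cell", ("c", "a")),
    "c": ("cell", ("null", "b")),
    "null": ("nil", ()),
}
print(joins(dll))
```

In this toy DLL, the middle cell and the null node are joins, since each of them is the target of two edges.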

A path from $v$ to $v'$ in $g$ is a sequence $p = v_0, i_1, v_1, \ldots, i_n, v_n$ where $v_0 = v$, $v_n = v'$,
and for each $j : 1 \leq j \leq n$, $v_j$ is the $i_j$-th successor of $v_{j-1}$. The length of $p$ is defined
as $length(p) = n$. The cost of $p$ is the sequence $i_1, \ldots, i_n$. We say that $p$ is cheaper than
another path $p'$ iff the cost of $p$ is lexicographically smaller than that of $p'$. A node $u$ is
reachable from a node $v$ iff there is a path from $v$ to $u$ or $u = v$. A graph $g$ is accessible
from a node $v$ iff all its nodes are reachable from $v$. The node $v$ is then called the root
of $g$. A tree is a graph $t$ which is either empty, or it has exactly one root and each of its
nodes is the $i$-th successor of at most one node $v$ for some $i \in \mathbb{N}$.
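
Reachability and accessibility can be checked by a simple graph traversal. The sketch below assumes the same hypothetical dictionary encoding of graphs (node to a pair of label and successor tuple); the function names are illustrative only.

```python
def reachable(g, v):
    """Set of nodes reachable from v (v itself included)."""
    seen, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for s in g[u][1]:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return seen

def accessible_from(g, v):
    """True iff all nodes of g are reachable from v."""
    return reachable(g, v) == set(g)

sll = {
    "a": ("next", ("b",)),
    "b": ("next", ("null",)),
    "null": ("nil", ()),
}
print(accessible_from(sll, "a"), accessible_from(sll, "b"))
```

Here the list is accessible from its head `a` but not from `b`, because `a` is not reachable from `b`.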

Forests. Let $\Sigma \cap \mathbb{N} = \emptyset$. A $\Sigma$-labelled forest is a sequence of trees $t_1 \cdots t_n$ over
$(\Sigma \cup \{1, \ldots, n\})$ where $\forall 1 \leq i \leq n : \#i = 0$. Leaves labelled by $i \in \mathbb{N}$ are called root references.
The forest $t_1 \cdots t_n$ represents the graph $\otimes t_1 \cdots t_n$ obtained by uniting the trees of
$t_1 \cdots t_n$, assuming w.l.o.g. that their sets of nodes are disjoint, and interconnecting their
roots with the corresponding root references. Formally, $\otimes t_1 \cdots t_n$ contains an edge
$v \mapsto (a, v_1 \cdots v_m)$ iff there is an edge $v \mapsto (a, v'_1 \cdots v'_m)$ of some tree $t_i$, $1 \leq i \leq n$, s.t. for all
$1 \leq j \leq m$, $v_j = root(t_k)$ if $v'_j$ is a root reference with $\ell(v'_j) = k$, and $v_j = v'_j$ otherwise.
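
The uniting of a forest into a graph can be sketched as follows. The encoding is an assumption made for illustration: each tree is given as a pair of its root and a dictionary from nodes to pairs of label and successor tuple, and a leaf whose label is an integer k plays the role of a root reference to the k-th tree.

```python
def unite(trees):
    """Build the graph represented by a forest.

    trees: list of (root, tree) pairs; tree maps node -> (label, successors).
    Leaves labelled by an int k are root references to the k-th tree (1-based).
    """
    roots = [root for root, _ in trees]
    graph = {}
    for _root, tree in trees:
        # Map each root-reference leaf to the root it refers to.
        target = {v: roots[lab - 1]
                  for v, (lab, _succs) in tree.items()
                  if isinstance(lab, int)}
        for v, (lab, succs) in tree.items():
            if v in target:
                continue  # reference leaves do not appear in the graph
            graph[v] = (lab, tuple(target.get(s, s) for s in succs))
    return graph

# Two trees whose leaves refer to each other's roots: a two-node cycle.
t1 = ("a", {"a": ("next", ("r2",)), "r2": (2, ())})
t2 = ("b", {"b": ("next", ("r1",)), "r1": (1, ())})
print(unite([t1, t2]))
```

The example yields a cyclic graph with the edges a to b and b to a, which no single tree could represent.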

Tree automata. A (finite, non-deterministic, top-down) tree automaton (TA) is a quadruple
$A = (Q, \Sigma, \Delta, R)$ where $Q$ is a finite set of states, $R \subseteq Q$ is a set of root states, $\Sigma$ is
a ranked alphabet, and $\Delta$ is a set of transition rules. Each transition rule is a triple of
the form $(q, a, q_1 \ldots q_n)$ where $n \geq 0$, $q, q_1, \ldots, q_n \in Q$, $a \in \Sigma$, and $\#a = n$. In the special
case where $n = 0$, we speak about the so-called leaf rules.

A run of $A$ over a tree $t$ over $\Sigma$ is a mapping $\rho : dom(t) \to Q$ s.t. for each node
$v \in dom(t)$ where $q = \rho(v)$, if $q_i = \rho(S(v)_i)$ for $1 \leq i \leq |S(v)|$, then $\Delta$ has a rule
$q \to \ell(v)(q_1 \ldots q_{|S(v)|})$. We write $t \Longrightarrow_\rho q$ to denote that $\rho$ is a run of $A$ over $t$ s.t. $\rho(root(t)) = q$.
We use $t \Longrightarrow q$ to denote that $t \Longrightarrow_\rho q$ for some run $\rho$. The language of a state $q$ is
defined by $L(q) = \{t \mid t \Longrightarrow q\}$, and the language of $A$ is defined by $L(A) = \bigcup_{q \in R} L(q)$.
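
The acceptance condition can be illustrated by a naive recursive membership test. This is a sketch under assumed encodings (rules as triples of state, symbol, and tuple of child states; trees as dictionaries from nodes to pairs of label and successor tuple), not the algorithm used in any particular tool.

```python
def accepts(rules, q, tree, v):
    """True iff the subtree rooted at v has a run labelling v by q."""
    label, succs = tree[v]
    return any(
        all(accepts(rules, qi, tree, s) for qi, s in zip(children, succs))
        for (p, a, children) in rules
        if p == q and a == label and len(children) == len(succs)
    )

def in_language(rules, root_states, tree, root):
    """True iff the tree is in the union of L(q) over root states q."""
    return any(accepts(rules, q, tree, root) for q in root_states)

# A TA accepting null-terminated chains of "next" symbols.
rules = {("q", "next", ("q",)), ("q", "nil", ())}
chain = {"n1": ("next", ("n2",)), "n2": ("next", ("n3",)), "n3": ("nil", ())}
print(in_language(rules, {"q"}, chain, "n1"))
```

The search is exponential in the worst case; it is meant only to make the definition of a run concrete.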

Graphs and forests with ports. We will further work with graphs with designated input
and output points. An io-graph is a pair $(g, \phi)$, abbreviated as $g_\phi$, where $g$ is a graph
and $\phi \in dom(g)^+$ a sequence of ports in which $\phi_1$ is the input port and $\phi_2 \cdots \phi_{|\phi|}$ is
a sequence of output ports such that the occurrence of ports in $\phi$ is unique. Ports and
joins of $g$ are called cut-points of $g_\phi$. We use $cps(g_\phi)$ to denote all cut-points of $g_\phi$. We
say that $g_\phi$ is accessible if it is accessible from the input port $\phi_1$.

An io-forest is a pair $f = (t_1 \cdots t_n, \pi)$ s.t. $n \geq 1$ and $\pi \in \{1, \ldots, n\}^+$ is a sequence of
port indices, $\pi_1$ is the input index, and $\pi_2 \cdots \pi_{|\pi|}$ is a sequence of output indices, with
no repetitions of indices in $\pi$. An io-forest encodes the io-graph $\otimes f$ where the ports of
$\otimes t_1 \cdots t_n$ are roots of the trees defined by $\pi$, i.e., $\otimes f = (\otimes t_1 \cdots t_n, root(t_{\pi_1}) \cdots root(t_{\pi_{|\pi|}}))$.



Forest automata. A forest automaton (FA) over $\Sigma$ is a pair $F = (A_1 \cdots A_n, \pi)$ where
$n \geq 1$, $A_1 \cdots A_n$ is a sequence of tree automata over $\Sigma \cup \{1, \ldots, n\}$, and $\pi \in \{1, \ldots, n\}^+$
is a sequence of port indices as defined for io-forests. The forest language of $F$ is the
set of io-forests $L_f(F) = L(A_1) \times \cdots \times L(A_n) \times \{\pi\}$, and the graph language of $F$ is the
set of io-graphs $L(F) = \{\otimes f \mid f \in L_f(F)\}$.

Structured labels. We will further work with alphabets where symbols, called structured
labels, have an inner structure. Let $\Gamma$ be a ranked alphabet of sub-labels, ordered by a total
ordering $\sqsubset_\Gamma$. We will work with graphs over the alphabet $2^\Gamma$ where for every symbol
$A \subseteq \Gamma$, $\#A = \sum_{a \in A} \#a$. Let $e = v \mapsto (\{a_1, \ldots, a_m\}, v_1 \cdots v_n)$ be an edge of a graph $g$ where
$n = \sum_{1 \leq i \leq m} \#a_i$ and $a_1 \sqsubset_\Gamma a_2 \sqsubset_\Gamma \cdots \sqsubset_\Gamma a_m$. The triple $e(i) = v \to (a_i, v_k \cdots v_l)$, $1 \leq i \leq m$,
from the sequence $e(1) = v \to (a_1, v_1 \cdots v_{\#a_1}), \ldots, e(m) = v \to (a_m, v_{n-\#a_m+1} \cdots v_n)$ is
called the $i$-th sub-edge of $e$ (or the $i$-th sub-edge of $v$ in $g$). We use $SE(g)$ to denote
the set of all sub-edges of $g$. We say that a node $v$ of a graph is isolated if it does not
appear within any sub-edge, neither as an origin (i.e., $\ell(v) = \emptyset$) nor as a target. A graph
$g$ without isolated nodes is unambiguously determined by $SE(g)$ and vice versa (due to
the total ordering $\sqsubset_\Gamma$ and since $g$ has no isolated nodes). We further restrict ourselves
to graphs with structured labels and without isolated nodes.
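
Splitting an edge with a structured label into its sub-edges can be sketched as below. The encoding is hypothetical: sub-labels are strings, their ranks are given by a dictionary, and Python's string ordering stands in for the total ordering on sub-labels.

```python
def sub_edges(v, sublabels, succs, rank):
    """Split the edge v -> (sublabels, succs) into its sub-edges.

    rank maps each sub-label to its rank; sorted() stands in for the
    total ordering on sub-labels assumed in the text.
    """
    assert sum(rank[a] for a in sublabels) == len(succs)
    out, pos = [], 0
    for a in sorted(sublabels):
        # The i-th sub-label consumes the next rank[a] successors.
        out.append((v, a, tuple(succs[pos:pos + rank[a]])))
        pos += rank[a]
    return out

rank = {"next": 1, "prev": 1, "data": 0}
print(sub_edges("b", {"next", "prev", "data"}, ("c", "a"), rank))
```

Each sub-label takes as many successors as its rank prescribes, so a data sub-label of rank 0 yields a sub-edge without targets.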

A counterpart of the notion of sub-edges in the context of rules of TA is the notion of
rule-terms, defined as follows: Given a rule $\delta = (q, \{a_1, \ldots, a_m\}, q_1 \cdots q_n)$ of a TA over
structured labels of $2^\Gamma$, rule-terms of $\delta$ are the terms $\delta(1) = a_1(q_1 \cdots q_{\#a_1}), \ldots, \delta(m) =
a_m(q_{n-\#a_m+1} \cdots q_n)$ where $\delta(i)$, $1 \leq i \leq m$, is called the $i$-th rule-term of $\delta$.

Forest automata of a higher level. We let $\Gamma_1$ be the set of all forest automata over $2^\Gamma$ and
call its elements forest automata over $\Gamma$ of level 1. For $i > 1$, we define $\Gamma_i$ as the set of
all forest automata over ranked alphabets $2^{\Gamma \cup \Delta}$ where $\Delta \subseteq \Gamma_{i-1}$ is any nonempty finite
set of FA of level $i - 1$. We denote elements of $\Gamma_i$ as forest automata over $\Gamma$ of level
$i$. The rank $\#F$ of an FA $F$ in these alphabets is the number of its output port indices.
When used in an FA $F$ over $2^{\Gamma \cup \Delta}$, the forest automata from $\Delta$ are called boxes of $F$. We
write $\Gamma_*$ to denote $\bigcup_{i > 0} \Gamma_i$ and assume that $\Gamma_*$ is ordered by some total ordering $\sqsubset_{\Gamma_*}$.

An FA F of a higher level over F accepts graphs where forest automata of lower lev- 
els appear as sub-labels. To define the semantics of F as a set of graphs over F, we need 
the following operation of sub-edge replacement where a sub-edge of a graph is substi- 
tuted by another graph. Intuitively, the sub-edge is removed, and its origin and targets 
are identified with the input and output ports of the substituted graph, respectively. 

Formally, let $g$ be a graph with an edge $e \in g$ and its $i$-th sub-edge $e(i) = v_1 \to
(a, v_2 \cdots v_n)$, $1 \leq i \leq |S_g(v_1)|$. Let $g'_\phi$ be an io-graph with $|\phi| = n$. Assume w.l.o.g. that
$dom(g) \cap dom(g') = \emptyset$. The sub-edge $e(i)$ can be replaced by $g'$ provided that $\forall 1 \leq j \leq
n : \ell_g(v_j) \cap \ell_{g'}(\phi_j) = \emptyset$, which means that the node $v_j \in dom(g)$ and the corresponding
port $\phi_j \in dom(g')$ do not have successors reachable over the same symbol. If the replacement
can be done, the result, denoted $g[g'_\phi / e(i)]$, is the graph $g_n$ in the sequence
$g_0, \ldots, g_n$ of graphs defined as follows: $SE(g_0) = SE(g) \cup SE(g') \setminus \{e(i)\}$, and for each
$j : 1 \leq j \leq n$, the graph $g_j$ arises from $g_{j-1}$ by (1) deriving a graph $h$ by replacing the
origin of the sub-edges of the $j$-th port $\phi_j$ of $g'$ by $v_j$, (2) redirecting edges leading to
$\phi_j$ to $v_j$, i.e., replacing all occurrences of $\phi_j$ in $img(h)$ by $v_j$, and (3) removing $\phi_j$.

If the symbol $a$ above is an FA and $g'_\phi \in L(a)$, we say that $h = g[g'_\phi / e(i)]$ is an
unfolding of $g$, written $g \prec h$. Conversely, we say that $g$ arises from $h$ by folding $g'_\phi$ into
$e(i)$. Let $\prec^*$ be the reflexive transitive closure of $\prec$. The $\Gamma$-semantics of $g$ is then the set
of graphs $g'$ over $\Gamma$ s.t. $g \prec^* g'$, denoted $[\![g]\!]_\Gamma$, or just $[\![g]\!]$ if no confusion may arise. For
an FA $F$ of a higher level over $\Gamma$, we let $[\![F]\!] = \bigcup_{g_\phi \in L(F)} ([\![g]\!] \times \{\phi\})$.

Canonicity. We call an io-forest $f = (t_1 \cdots t_n, \pi)$ minimal iff the roots of the trees $t_1 \cdots t_n$
are the cut-points of $\otimes f$. A minimal forest representation of a graph is unique up to
reordering of $t_1 \cdots t_n$. Let the canonical ordering of cut-points of $\otimes f$ be defined by the
cost of the cheapest paths leading from the input port to them. We say that $f$ is canonical
iff it is minimal, $\otimes f$ is accessible, and the trees within $t_1 \cdots t_n$ are ordered by the
canonical ordering of their roots (which are cut-points of $\otimes f$). A canonical forest is thus
a unique representation of an accessible io-graph. We say that an FA respects canonicity
iff all forests from its forest language are canonical. Respecting canonicity makes
it possible to efficiently test FA language inclusion by testing TA language inclusion
of the respective components of two FA. This method is precise for FA of level 1 and
sound (not always complete) for FA of a higher level [6].

In practice, we keep automata in the so-called state uniform form, which simplifies
maintaining the canonicity respecting form [6] (and it is also useful when abstracting
and "folding", as discussed in the following). It is defined as follows. Given a node $v$ of
a tree $t$ in an io-forest, we define its span as the pair $(\alpha, V)$ where $\alpha \in \mathbb{N}^*$ is the sequence
of labels of root references reachable from the root of $t$ ordered according to the prices
of the cheapest paths to them, and $V \subseteq \mathbb{N}$ is the set of labels of references which occur
more than once in $t$. The state uniform form then requires that all nodes of forests from
$L(F)$ that are labelled by the same state $q$ in some accepting run of $F$ have the same
span, which we denote by $span(q)$.

3 FA-based Shape Analysis 

We now provide a high-level overview of the main loop of our shape analysis. The 
analysis automatically discovers memory safety errors (such as invalid dereferences 
of null or undefined pointers, double frees, or memory leaks) and provides an FA- 
represented over-approximation of the sets of heap configurations reachable at each 
program line. We consider sequential non-recursive C programs manipulating the heap. 
Each heap cell may have several pointer selectors and data selectors from some finite 
data domain (below, PSel denotes the set of pointer selectors, DSel denotes the set of 
data selectors, and D denotes the data domain). 

Heap representation. A single heap configuration is encoded as an io-graph $g_{sf}$ over the
ranked alphabet of structured labels $2^\Gamma$ with sub-labels from the ranked alphabet $\Gamma =
PSel \cup (DSel \times D)$ with the ranking function that assigns each pointer selector 1 and each
data selector 0. In this graph, an allocated memory cell is represented by a node $v$, and
its internal structure of selectors is given by a label $\ell_g(v) \in 2^\Gamma$. Values of data selectors
are stored directly in the structured label of a node as sub-labels from $DSel \times D$, so,
e.g., a singly linked list cell with the data value 42 and the successor node $x_{next}$ may
be represented by a node $x$ such that $\ell_g(x) = \{next(x_{next}), (data, 42)(\varepsilon)\}$. Selectors
with undefined values are represented such that the corresponding sub-labels are not in
$\ell_g(x)$. The null value is modelled as the special node null such that $\ell_g(null) = \emptyset$. The
input port sf represents a special node that contains the stack frame of the analysed
function, i.e., a structure where selectors correspond to variables of the function.
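
The encoding of heap cells as structured labels can be sketched as follows. The selector sets `PSEL` and `DSEL`, the function `encode`, and the input format are assumptions chosen for illustration; the stack-frame node sf is omitted for brevity.

```python
PSEL = {"next", "prev"}  # pointer selectors, rank 1 (assumed names)
DSEL = {"data"}          # data selectors, rank 0 (assumed names)

def encode(cells):
    """Encode cells as node -> structured label (a set of sub-edges).

    cells maps a cell name to {selector: value}; the value None marks an
    undefined selector, which contributes no sub-label.
    """
    graph = {"null": set()}  # null carries the empty label
    for name, fields in cells.items():
        label = set()
        for sel, val in fields.items():
            if val is None:
                continue
            if sel in PSEL:
                label.add((sel, (val,)))     # e.g. next(x_next)
            elif sel in DSEL:
                label.add(((sel, val), ()))  # e.g. (data, 42) without targets
            else:
                raise ValueError(f"unknown selector: {sel}")
        graph[name] = label
    return graph

heap = encode({"x": {"next": "y", "data": 42},
               "y": {"next": None, "data": 7}})
print(heap["x"])
```

In the example, cell `y` has an undefined next selector, so its label contains only the data sub-label.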

In order to represent (infinite) sets of heap configurations, we use state uniform FA 
of a higher level to represent sets of canonical io-forests representing the heap configu- 
rations. The FA used as boxes are learnt during the analysis using the learning algorithm 
presented in Sec. 4.

Symbolic Execution. The verification procedure performs standard abstract interpretation
with the abstract domain consisting of sets of state uniform FA (a single FA does
not suffice as FA are not closed under union) representing sets of heap configurations
at particular program locations. The computation starts from the initial heap configuration
given by an FA for the io-graph $g_{sf}$ where $g$ comprises two nodes: null and sf
where $\ell_g(sf) = \emptyset$. The computation then executes abstract transformers corresponding
to program statements until the sets of FA held at program locations stabilise. We note
that abstract transformers corresponding to pointer manipulating statements are exact.
Executing the abstract transformer $\tau_{op}$ over a set of FA $S$ is performed separately for
every $F \in S$. Some of the boxes are first unfolded to uncover the accessed part of the heaps,
then the update is performed. The detailed description of these steps can be found in [7].
At junctions of program paths, the analysis computes unions of sets of FA. At loop
points, the union is followed by widening. The widening is performed by applying box
folding and abstraction repeatedly in a loop on each FA from $S$ until the result stabilises.
An elaboration of these two operations, described in detail in Sec. 4 and 5, respectively,
belongs to the main contributions of the presented paper.



4 Learning of Boxes 

Sets of graphs with an unbounded number of joins can only be described by FA with the 
help of boxes. In particular, boxes allow one to replace (multiple) incoming sub-edges 
of a join by a single sub-edge, and hence lower the in-degree of the join. Decreasing the 
in-degree to 1 turns the join into an ordinary node. When a box is then used in a cycle 
of an FA, it effectively generates an unbounded number of joins. 

The boxes are introduced by the operation of folding of an FA $F$ which transforms
$F$ into an FA $F'$ and a box $B$ used in $F'$ such that $[\![F]\!] = [\![F']\!]$. However, the graphs
in $L(F')$ may contain fewer joins since some of them are hidden in the box $B$, which
encodes a set of subgraphs containing a join and appearing repeatedly in the graphs of
$L(F)$. Before we explain folding, we give a characterisation of subgraphs of graphs of
$L(F)$ which we want to fold into a box $B$. Our choice of the subgraphs to be folded
is a compromise between two high-level requirements. On the one hand, the folded 
subgraphs should contain incoming edges of joins and be as simple as possible in order 
to be reusable. On the other hand, the subgraphs should not be too small in order not 
to have to be subsequently folded within other boxes (in the worst case, leading to 
generation of unboundedly nested boxes). Ideally, the hierarchical structuring of boxes 
should respect the natural hierarchical structuring of the data structures being handled 
since if this is not the case, unboundedly many boxes may again be needed. 




4.1 Knots of Graphs 

A graph $h$ is a subgraph of a graph $g$ iff $SE(h) \subseteq SE(g)$. The border of $h$ in $g$ is the subset
of the set $dom(h)$ of nodes of $h$ that are incident with sub-edges in $SE(g) \setminus SE(h)$. A trace
from a node $u$ to a node $v$ in a graph $g$ is a set of sub-edges $t = \{e_0, \ldots, e_n\} \subseteq SE(g)$
such that $n \geq 1$, $e_0$ is an outgoing sub-edge of $u$, $e_n$ is an incoming sub-edge of $v$, the
origin of $e_i$ is one of the targets of $e_{i-1}$ for all $1 \leq i \leq n$, and no two sub-edges have the
same origin. We call the origins of $e_1, \ldots, e_n$ the inner nodes of the trace. A trace from
$u$ to $v$ is straight iff none of its inner nodes is a cut-point. A cycle is a trace from a node
$v$ to $v$. A confluence of $g_\phi$ is either a cycle of $g_\phi$ or it is the union of two disjoint traces
starting at a node $u$, called the base, and ending in the node $v$, called the tip (for a cycle,
the base and the tip coincide).

Given an io-graph $g_\phi$, the signature of a sub-graph $h$ of $g$ is the minimum subset
$sig(h)$ of $cps(g_\phi)$ such that (1) it contains $cps(g_\phi) \cap dom(h)$ and (2) all nodes of $h$, except
the nodes of $sig(h)$ themselves, are reachable by straight traces from $sig(h)$. Intuitively,
$sig(h)$ contains all cut-points of $h$ plus the closest cut-points to $h$ which lie outside of $h$
but which are needed so that all nodes of $h$ are reachable from the signature.
Consider the example of the graph $g_u$ in Fig. 2 in which cut-points are represented by •.
The signature of $g_u$ is the set $\{u, v\}$. The signature of the highlighted subgraph $h$ is also
equal to $\{u, v\}$. Given a set $U \subseteq cps(g_\phi)$, a confluence of $U$ is a confluence of $g_\phi$ with the
signature within $U$. Intuitively, the confluence of a set of cut-points $U$ is a confluence whose
cut-points belong to $U$ and, in case the base is not a cut-point, the closest cut-point
from which the base is reachable is also from $U$. Finally, the closure of $U$ is the smallest
subgraph $h$ of $g_\phi$ that (1) contains all confluences of $U$ and (2) for every inner node $v$ of
a straight trace of $h$, it contains all straight traces from $v$ to leaves of $g$. The closure of
the signature $\{u, v\}$ of the graph $g_u$ in Fig. 2 is the highlighted subgraph $h$. Intuitively,
Point 1 includes into the closure all nodes and sub-edges that appear on straight traces
between nodes of $U$ apart from those that do not lie on any confluence (such as node $u$
in Fig. 2). Note that nodes $x$ and $y$ in Fig. 2, which are leaves of $g_u$, are not in the closure
as they are not reachable from an inner node of any straight trace of $h$. The closure of
a subgraph $h$ of $g_\phi$ is the closure of its signature, and $h$ is closed iff it equals its closure.

Fig. 2: Closure.

Knots. For the rest of Sec. 4.1, let us fix an io-graph $g_\phi \in L(F)$. We now introduce the
notion of a knot which summarises the desired properties of a subgraph $k$ of $g$ that is to
be folded into a box. A knot $k$ of $g_\phi$ is a subgraph of $g$ such that: (1) $k$ is a confluence,
(2) $k$ is the union of two knots with intersecting sets of sub-edges, or (3) $k$ is the closure
of a knot. A decomposition of a knot $k$ is a set of knots such that the union of their
sub-edges equals $SE(k)$. The complexity of a decomposition of $k$ is the maximum of
sizes of signatures of its elements. We define the complexity of a knot as the minimum
of the complexities of its decompositions. A knot $k$ of complexity $n$ is an optimal knot
of complexity $n$ if it is maximal among knots of complexity $n$ and if it has a root. The
root must be reachable from the input port of $g_\phi$ by a trace that does not intersect with
sub-edges of the optimal knot. Notice that the requirement of maximality implies that
optimal knots are closed.
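
The min-max definition of knot complexity can be made concrete by a brute-force sketch. Everything about the encoding is an assumption for illustration: knots are frozensets of sub-edge identifiers, the candidate knots and their signature sizes are supplied by the caller, and no attempt is made to compute knots or signatures from a graph.

```python
from itertools import combinations

def knot_complexity(k, candidates, sig_size):
    """Minimum over decompositions of k of the maximum signature size.

    k: frozenset of sub-edges; candidates: known knots (frozensets);
    sig_size: dict mapping each knot to the size of its signature.
    Returns None if no decomposition can be built from the candidates.
    """
    cands = [c for c in candidates if c <= k]
    best = None
    for r in range(1, len(cands) + 1):
        for dec in combinations(cands, r):
            if frozenset().union(*dec) == k:  # sub-edges must cover k exactly
                cost = max(sig_size[c] for c in dec)
                best = cost if best is None else min(best, cost)
    return best

# Hypothetical knots over sub-edges e1..e3 with assumed signature sizes.
k = frozenset({"e1", "e2", "e3"})
k1 = frozenset({"e1", "e2"})
k2 = frozenset({"e2", "e3"})
sizes = {k: 3, k1: 2, k2: 2}
print(knot_complexity(k, [k, k1, k2], sizes))
```

In this toy instance, the trivial decomposition {k} has complexity 3, while {k1, k2} has complexity 2, so the knot complexity is 2. The enumeration is exponential and serves only to spell out the definition.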



The following lemma, proven in Q, implies that optimal knots are uniquely identi- 
fied by their signatures, which is crucial for the folding algorithm presented later. 

Lemma 1. The signature of an optimal knot of $g_\phi$ equals the signature of its closure.

Next, we explain the motivation behind the notion of an optimal knot:

Confluences. As mentioned above, in order to allow one to eliminate a join, a knot
must contain some join $v$ together with at least one incoming sub-edge in case the knot
is based on a loop and at least two sub-edges otherwise. Since $g_\phi$ is accessible (meaning
that there do not exist any traces that cannot be extended to start from the same node),
the edge must belong to some confluence $c$ of $g_\phi$. If the folding operation does not fold
the entire $c$, then a new join is created on the border of the introduced box: one of its
incoming sub-edges is labelled by the box that replaces the folded knot, another one is
the last edge of one of the traces of $c$. Confluences are therefore the smallest subgraphs
that can be folded in a meaningful way.

Uniting knots. If two different confluences $c$ and $c'$ share an
edge, then after folding $c$, the resulting edge shares with $c'$ two
nodes (at least one being a target node), and thus $c'$ contains a join
of $g_\phi$. To eliminate this join too, both confluences must be folded
together. A similar reasoning may be repeated with knots in general.
Usefulness of this rule may be illustrated by an example of the set of lists with
head pointers. Without uniting, every list would generate a hierarchy of knots of the
same depth as the length of the list, as illustrated in Fig. 3. This is clearly impractical
since the entire set could not be represented using finitely many boxes. Rule 2 unites
all knots into one that contains the entire list, and the set of all such knots can then be
represented by a single FA (containing a loop accepting the inner nodes of the lists).

Fig. 3: A list with head pointers.

Complexity of knots. The notion of complexity is introduced to limit the effect of 
Rule 2 of the definition of a knot, which unites knots that share a sub-edge, and to hopefully
make it follow the natural hierarchical structuring of data structures. Consider, for
instance, the case of singly-linked lists (SLLs) of cyclic doubly-linked lists (DLLs). In 
this case, it is natural to first fold the particular segments of the DLLs (denoted as DLSs 
below), i.e., to introduce a box for a single pair of next and prev pointers. This way, one 
effectively obtains SLLs of cyclic SLLs. Subsequently, one can fold the cyclic SLLs 
into a higher-level box. However, uniting all knots with a common sub-edge would cre- 
ate knots that contain entire cyclic DLLs (requiring unboundedly many joins inside the 
box). The reason is that in addition to the confluences corresponding to DLSs, there 
are confluences which traverse the entire cyclic DLLs and that share sub-edges with all 
DLSs (this is in particular the case of the two circular sequences consisting solely of 
next and prev pointers respectively). To avoid the undesirable folding, we exploit the 
notion of complexity and fold graphs in successive rounds. In each round we fold all 
optimal knots with the smallest complexity (as described in Sec. 4.2), which should
correspond to the currently most nested, not yet folded, sub-structures. In the previous 
example, the algorithm starts by folding DLSs of complexity 2, because the complexity 
of the confluences in cyclic DLLs is given by the number of the DLSs they traverse. 
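
The selection made in each round can be sketched in a few lines of Python. This is only an illustration under stated assumptions: knots are given as hypothetical (name, complexity) pairs, and the complexities follow the example above for a cyclic DLL with three segments (2 for each DLS, 3 for the two circular confluences). How complexities are actually computed is not shown.

```python
def smallest_complexity_knots(knots):
    """Return the names of the knots that would be folded in the current round.

    `knots` is a list of (name, complexity) pairs; the representation is
    a hypothetical one chosen for this sketch.
    """
    if not knots:
        return []
    smallest = min(complexity for _, complexity in knots)
    return [name for name, complexity in knots if complexity == smallest]


# Cyclic DLL with three segments: each DLS has complexity 2, while the two
# circular confluences (next-only and prev-only) traverse three DLSs.
knots = [
    ("DLS_1", 2), ("DLS_2", 2), ("DLS_3", 2),
    ("next_cycle", 3), ("prev_cycle", 3),
]
first_round = smallest_complexity_knots(knots)
```

In this toy input, the first round selects only the three DLSs, which corresponds to folding the most nested sub-structures first.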

Closure of knots. The closure is introduced for practical reasons. It allows one to 
identify optimal knots by their signatures, which is then used to simplify automata 
constructions that implement folding on the level of FA (cf. Sec. 4.2).



Root of an optimal knot. The requirement for an optimal knot k to have a root is to
guarantee that if an io-graph h_ψ containing a box B representing k is accessible, then the
io-graph h_ψ[k/B] emerging by substituting k for a sub-edge labelled with B is accessible,
and vice versa. It is also a necessary condition for the existence of a canonical forest
representation of the knot itself (since one needs to order the cut-points w.r.t. the prices
of the paths leading to them from the input port of the knot).

4.2 Folding in the Abstraction Loop 

In this section, we describe the operation of folding together with the main abstraction
loop of which folding is an integral part. The pseudo-code of the main abstraction loop is
shown in Alg. 1. The algorithm modifies a set of FA until it reaches a fixpoint. Folding on
line 5 is a sub-procedure of the algorithm which looks for substructures of FA that accept
optimal knots, and replaces these substructures by boxes that represent the corresponding
optimal knots. The operation of folding is itself composed of four consecutive steps:
Identifying indices, Splitting, Constructing boxes, and Applying boxes. For space reasons, we
give only an overview of the steps of the main abstraction loop and folding. Details may be
found in [7].

    Alg. 1: Abstraction Loop
    1  Unfold solitaire boxes
    2  repeat
    3      Normalise
    4      Abstract
    5      Fold
    6  until fixpoint
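
The control structure of the abstraction loop can be sketched as follows. This is a minimal illustration, not the Forester implementation: the four steps are passed in as plain functions on a set of hashable stand-ins for FA, and the names `abstraction_loop`, `max_rounds`, and the toy steps are assumptions made only for this sketch.

```python
def abstraction_loop(fas, unfold_solitaire, normalise, abstract, fold,
                     max_rounds=1000):
    """Iterate normalise/abstract/fold on a set of FA until nothing changes.

    Each step is a function from a frozenset to a frozenset. `max_rounds`
    is a safety bound added for this sketch only.
    """
    current = unfold_solitaire(frozenset(fas))      # line 1
    for _ in range(max_rounds):                     # line 2: repeat
        previous = current
        current = normalise(current)                # line 3
        current = abstract(current)                 # line 4
        current = fold(current)                     # line 5
        if current == previous:                     # line 6: until fixpoint
            return current
    raise RuntimeError("no fixpoint reached within max_rounds")


# Toy instantiation: "FA" are list lengths, and abstraction caps them at 3.
identity = lambda s: s
cap_at_three = lambda s: frozenset(min(x, 3) for x in s)
result = abstraction_loop({1, 5, 9}, identity, identity, cap_at_three, identity)
```

The toy steps only exercise the fixpoint test; in the analysis itself, each step is an automata construction as described in the rest of this section.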

Unfolding of solitaire boxes. Folding is in practice applied on FA that accept partially
folded graphs (only some of the optimal knots are folded). This may lead the algorithm to
hierarchically fold data structures that are not hierarchical, causing the symbolic execution
not to terminate. For example, consider a program that creates a DLL of an arbitrary length.
Whenever a new DLS is attached, the folding algorithm would enclose it into a box together
with the tail which was folded previously. This would lead to creation of a hierarchical
structure of an unbounded depth (see Fig. 4), which would cause the symbolic execution to
never reach a fixpoint. Intuitively, this is a situation when a repetition of subgraphs may be
expressed by an automaton loop that iterates a box, but it is instead misinterpreted as
a recursive nesting of graphs. This situation may happen when a newly created box
contains another box that cannot be iterated since it does not appear on a loop (e.g., in
Fig. 4 there is always one occurrence of a box encoding a shorter DLL fragment inside
a higher-level box). This issue is addressed in the presented algorithm by first unfolding
all occurrences of boxes that are not iterated by automata loops before folding is started.

[Fig. 4: DLL.]
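
One possible way to detect such boxes is sketched below. It rests on an assumption that the text does not spell out: a box is treated as iterated iff at least one rule labelled by it lies on a cycle of the state graph of the TA. Rules are hypothetical triples (parent state, label, child states), and the function names are invented for this sketch.

```python
def reachable(edges, start):
    """States reachable from `start` (including `start`) in the state graph."""
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        for succ in edges.get(state, ()):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen


def solitaire_boxes(rules, boxes):
    """Box labels none of whose rules lie on a cycle of the state graph."""
    edges = {}
    for parent, _label, children in rules:
        edges.setdefault(parent, set()).update(children)
    iterated = set()
    for parent, label, children in rules:
        # The rule is on a loop if its parent can be reached from a child.
        if label in boxes and any(parent in reachable(edges, c) for c in children):
            iterated.add(label)
    return set(boxes) - iterated


rules = [
    ("q0", "DLS", ("q1",)),     # first segment of a list
    ("q1", "DLS", ("q1",)),     # loop iterating the DLS box
    ("q1", "nil", ()),
    ("p0", "Nested", ("p1",)),  # box used exactly once, never on a loop
    ("p1", "nil", ()),
]
to_unfold = solitaire_boxes(rules, {"DLS", "Nested"})
```

Here only the hypothetical box "Nested" would be unfolded before folding starts, since "DLS" is iterated by the loop on q1.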

Normalising. We define the index of a cut-point u ∈ cps(g_φ) as its position in the canonical
ordering of cut-points of g_φ, and the index of a closed subgraph h of g_φ as the set of
indices of the cut-points in sig(h). The folding algorithm expects the input FA F to satisfy
the property that all io-graphs of L(F) have the same indices of closed knots. The
reason is that folding starts by identifying the index of an optimal knot of an arbitrary
io-graph from L(F), and then it creates a box which accepts all closed subgraphs of the
io-graphs from L(F) with the same index. We need a guarantee that all these subgraphs
are indeed optimal knots. This guarantee can be achieved if the io-graphs from L(F)
have equivalent interconnections of cut-points, as defined below.

We define the relation ∼_{g_φ} ⊆ ℕ × ℕ between indices of closed knots of g_φ such that
N ∼_{g_φ} N' iff there is a closed knot k of g_φ with the index N and a closed knot k' with
the index N' such that k and k' have intersecting sets of sub-edges. We say that two
io-graphs g_φ and h_ψ are interconnection equivalent iff ∼_{g_φ} = ∼_{h_ψ}.

Lemma 2. Interconnection equivalent io-graphs have the same indices of optimal knots. 
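
The relation and the equivalence test can be illustrated on explicitly listed closed knots. The sketch below assumes, purely for illustration, that each closed knot is given as a pair (index, set of sub-edges); enumerating the closed knots of an io-graph is not shown.

```python
from itertools import product


def interconnection(knots):
    """Pairs (N, N') of indices of closed knots that share a sub-edge."""
    return {
        (index1, index2)
        for (index1, edges1), (index2, edges2) in product(knots, repeat=2)
        if edges1 & edges2
    }


def interconnection_equivalent(knots_g, knots_h):
    """True iff both graphs induce the same relation on indices."""
    return interconnection(knots_g) == interconnection(knots_h)


# Two graphs whose knots overlap in the same pattern (names of edges differ).
g = [(frozenset({1, 2}), frozenset({"a", "b"})),
     (frozenset({2, 3}), frozenset({"b", "c"}))]
h = [(frozenset({1, 2}), frozenset({"x", "y"})),
     (frozenset({2, 3}), frozenset({"y"}))]
# Same indices, but the two knots no longer share any sub-edge.
h_disjoint = [(frozenset({1, 2}), frozenset({"x"})),
              (frozenset({2, 3}), frozenset({"y"}))]
```

With these inputs, g and h are interconnection equivalent, whereas g and h_disjoint are not, because the pair of indices ({1, 2}, {2, 3}) is related only in g.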

Interconnection equivalence of all io-graphs in the language of an FA F is achieved 
by transforming F to the interconnection respecting form. This form requires that the 
language of every TA of the FA consists of interconnection equivalent trees (when viewing
root references and roots as cut-points with corresponding indices). The transformation
is described in [7]. The normalisation step also includes a transformation into
the state uniform and canonicity respecting form. 

Abstraction. We use abstraction described in Sec. 5 that preserves the canonicity respecting
form of TA as well as their state uniformity. It may break interconnection
uniformity, in which case it is followed by another round of normalisation. Abstraction 
is included into each round of folding for the reason that it leads to learning more gen- 
eral boxes. For instance, an FA encoding a cyclic list of one particular length is first 
abstracted into an FA encoding a set of cyclic lists of all lengths, and the entire set is 
then folded into a single box. 

Identifying indices. For every FA F entering this sub-procedure, we pick an arbitrary
io-graph g_φ ∈ L(F), find all its optimal knots of the smallest possible complexity n, and
extract their indices. By Lemma 2 and since F is normalised, indices of the optimal
knots are the same for all io-graphs in L(F). For every found index, the following steps
fold all optimal knots with that index at once. Optimal knots of complexity n do not
share sub-edges, so the order in which they are folded is not important.

Splitting. For an FA F = (A_1 ⋯ A_n, π) and an index I of an optimal knot found in the
previous step, splitting transforms F into a (set of) new FA with the same language. The
nodes of the borders of I-indexed optimal knots of io-graphs from L(F) become roots
of trees of io-forests accepted by the new FA. Let s ∈ I be a position in F such that the
s-indexed cut-points of io-graphs from L(F) reach all the other I-indexed cut-points.
The index s exists since an optimal knot has a root. Due to the definition of the closure,
the border contains all I-indexed cut-points, with the possible exception of s. The s-th
cut-point may be replaced in the border of the I-indexed optimal knot by the base e of
the I-indexed confluence that is the first one reached from the s-th cut-point by a straight
path. We call e the entry. The entry e is a root of the optimal knot, and the s-th cut-point
is the only I-indexed cut-point that might be outside the knot. If e is indeed different
from the s-th cut-point, then the s-th tree of forests accepted by F must be split into two
trees in the new FA: The subtree rooted at the entry is replaced by a reference to a new
tree. The new tree then equals the subtree of the original s-th tree rooted at the entry.

The construction is carried out as follows. We find all states and all of their rules that
accept entry nodes. We denote such states and rules as entry states and rules. For every
entry state q, we create a new FA F_q^0 which is a copy of F but with the s-th TA A_s split
to a new s-th TA A'_s and a new (n+1)-th TA A_{n+1}. The TA A'_s is obtained from A_s by
changing the entry rules of q to accept just a reference to the new (n+1)-th root and by
removing entry rules of all other entry states (the entry states are processed separately in
order to preserve possibly different contexts of entry nodes accepted at different states).
The new TA A_{n+1} is a copy of A_s but with the only accepting state being q. Note that the
construction is justified since due to state uniformity, each node that is accepted by an
entry rule and that does not appear below a node that is also accepted by an entry rule
is an entry node. In the result, the set J = (I \ {s}) ∪ {n+1} contains the positions of
the trees of forests of F_q^0 rooted at the nodes of the borders of I-indexed optimal knots.

[Fig. 5: Creation of F_q and B_q from F_q^0. The subtrees that contain references i, j ∈ J are
taken into B_q, and replaced by the B_q-labelled sub-edge in F_q.]

Constructing boxes. For every F_q^0 and J being the result of splitting F according to
an index I, a box B_q is constructed from F_q^0. We transform TA of F_q^0 indexed by the
elements of J. The resulting TA will accept the original trees except that the roots are
stripped of the children that cannot reach a reference to J. To turn these TA into an
FA accepting optimal knots with the index I, it remains to order the obtained TA and
define port indices, which is described in detail in [7]. Roughly, the input index of the
box will be the position j to which we place the modified (n+1)-th TA of F_q^0 (the one
that accepts trees rooted at the entry). The output indices are the positions of the TA
with indices J \ {j} in F_q^0 which accept trees rooted at cut-points of the border of the
optimal knots.

Applying boxes. This is the last step of folding. For every F_q^0, J, and B_q which are the
result of splitting F according to an index I, we construct an FA F_q that accepts graphs
of F where knots enclosed in B_q are substituted by a sub-edge with the label B_q. It
is created from F_q^0 by (1) leaving out the parts of root rules of its TA that were taken
into B_q, and (2) adding the rule-term B_q(r_1, ..., r_m) to the rule-terms of root rules of
the (n+1)-th component of F_q^0 (these are rules used to accept the roots of the optimal knots
enclosed in B_q). The states r_1, ..., r_m are fresh states that accept root references to the
appropriate elements of J (to connect the borders of knots of B_q correctly to the graphs
of F_q; the details may be found in [7]). The FA F_q now accepts graphs where optimal
knots of graphs of L(F) with the signature I are hidden inside B_q. Creation of B_q and of
its counterpart F_q from F_q^0 is illustrated in Fig. 5 where i, j, ... ∈ J.

During the analysis, the discovered boxes must be stored in a database and tested for
equivalence with the newly discovered ones since the alphabets of FA would otherwise
grow with every operation of folding ad infinitum. That is, every discovered box is given
a unique name, and whenever a semantically equivalent box is folded, the newly created
edge-term is labelled by that name. This step offers an opportunity for introducing another
form of acceleration of the symbolic computation. Namely, when a box B is found
by the procedure described above, and another box B' with a name N s.t. ⟦B'⟧ ⊂ ⟦B⟧ is
already in the database, we associate the name N with B instead of with B' and restart the
analysis (i.e., start the analysis from scratch, remembering just the updated database
of boxes). If, on the other hand, ⟦B⟧ ⊂ ⟦B'⟧, the folding is performed using the name N
of B', thus overapproximating the semantics of the folded FA. As presented in Sec. 6,
this variant of the procedure, called folding by inclusion, performs in some difficult
cases significantly better than the former variant, called folding by equivalence.
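
The two variants of the database lookup can be sketched as follows. The sketch makes a strong simplifying assumption: the semantics of a box is modelled as a finite frozenset of graph names, so that equivalence and inclusion become set comparisons, whereas the analysis itself has to decide them on automata. The function `register_box` and the generated names are hypothetical.

```python
def register_box(db, box, by_inclusion=True):
    """Return (name, restart) for a newly discovered box.

    `db` maps box names to their (finite, illustrative) semantics and is
    updated in place. `restart` is True when the analysis would be restarted
    with the updated database.
    """
    for name, known in db.items():
        if known == box:                 # folding by equivalence
            return name, False
    if by_inclusion:
        for name, known in db.items():
            if known.issubset(box):      # stored box is strictly smaller
                db[name] = box           # re-associate the name with B
                return name, True
        for name, known in db.items():
            if box.issubset(known):      # stored box is strictly larger
                return name, False       # overapproximate using its name
    name = f"box{len(db)}"
    db[name] = box
    return name, False


db = {}
first = register_box(db, frozenset({"a"}))
second = register_box(db, frozenset({"a", "b"}))
third = register_box(db, frozenset({"b"}))
fourth = register_box(db, frozenset({"c"}), by_inclusion=False)
```

In this run the second call replaces the stored semantics and signals a restart, the third call reuses the existing name without a restart, and the fourth call introduces a fresh name.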

5 Abstraction 

The abstraction we use in our analysis is based on the general techniques described in
the framework of abstract regular (tree) model checking [2]. We, in particular, build on
the finite height abstraction of TA. It is parameterised by a height k ∈ ℕ, and it collapses
TA states q, q' iff they accept trees with the same sets of prefixes of the height at most k
(the prefix of height k of a tree is a subgraph of the tree which contains all paths from
the root of length at most k). This defines an equivalence on states denoted by ≈_k. The
equivalence ≈_k is further refined to deal with various features special for FA. Namely,
it has to work over tuples of TA and cope with the interconnection of the TA via root
references, with the hierarchical structuring, and with the fact that we use a set of FA
instead of a single FA to represent the abstract context at a particular program location.
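
The basic finite height abstraction (without the FA-specific refinements) can be sketched on a small TA. The representation is an assumption of this sketch: rules map each state to a list of (symbol, child states) pairs, every state is assumed to accept at least one tree, and a prefix of height 0 is represented by the placeholder "*".

```python
from itertools import product


def prefixes(rules, state, k):
    """Set of prefixes of height k of the trees accepted from `state`."""
    if k == 0:
        return frozenset({"*"})          # bare node, nothing below it is kept
    result = set()
    for symbol, children in rules.get(state, ()):
        child_sets = [prefixes(rules, child, k - 1) for child in children]
        for combo in product(*child_sets):
            result.add((symbol, combo))
    return frozenset(result)


def height_classes(rules, k):
    """Partition of the states into classes collapsed by the abstraction."""
    classes = {}
    for state in rules:
        classes.setdefault(prefixes(rules, state, k), []).append(state)
    return sorted(sorted(cls) for cls in classes.values())


# A TA accepting exactly one list with two "next" edges.
rules = {
    "q2": [("next", ("q1",))],
    "q1": [("next", ("q0",))],
    "q0": [("nil", ())],
}
classes_k1 = height_classes(rules, 1)
classes_k2 = height_classes(rules, 2)
```

For k = 1 the states q1 and q2 fall into the same class, and collapsing them introduces a loop that accepts lists of unbounded length. For k = 2 all three states stay distinct.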

Refinements of ≈_k. First, in order to maintain the same basic shape of the heap after
abstraction (such that no cut-point would, e.g., suddenly appear or disappear), we refine
≈_k by requiring that equivalent states must have the same spans (as defined in
Sec. 2). When applied on ≈_1, which corresponds to equivalence of data types, this
refinement provided enough precision for most of the case studies presented later on, with
the exception of the most difficult ones, namely programs with skip lists [13]. To verify
these programs, we needed to further refine the abstraction to distinguish automata
states whenever trees from their languages encode tree components containing a different
number of unique paths to some root reference, but some of these paths are hidden
inside boxes. In particular, two states q, q' can be equivalent only if for every io-graph
g_φ from the graph language of the FA, for every two nodes u, v ∈ dom(g_φ) accepted by
q and q', respectively, in an accepting run of the corresponding TA, the following holds:
For every w ∈ cps(g_φ), both u and v have the same number of outgoing sub-edges
(selectors) in ⟦g_φ⟧ which start a trace in ⟦g_φ⟧ leading to w. According to our experiments,
this refinement costs almost no performance, and hence we use it by default.

Abstraction for Sets of FA. Our analysis works with sets of FA. We observed that ab- 
stracting individual FA from a set of FA in isolation is sometimes slow since in each 
of the FA, the abstraction widens some selector paths only, and it takes a while until 
an FA in which all possible selector paths are widened is obtained. For instance, when 
analysing a program that creates binary trees, before reaching a fixpoint, the symbolic 
analysis generates many FA, each of them accepting a subset of binary trees with some 
of the branches restricted to a bounded length (e.g., trees with no right branches, trees 
with a single right branch of length 1, length 2, etc.). In such cases, it helps when the 
abstraction has an opportunity to combine information from several FA. For instance, 
consider an FA that encodes binary trees degenerated to an arbitrarily long left branch, 
and another FA that encodes trees degenerated to right branches only. Abstracting these 
FA in isolation has no effect. However, if the abstraction is allowed to collapse states 
from both of these FA, it can generate an FA accepting all possible branches.

Unfortunately, the natural solution to achieve the above, which is to unite FA before
abstraction, cannot be used since FA are not closed under union (uniting TA component-wise
overapproximates the union). However, it is possible to enrich the automata structure
of an FA F by TA states and rules of another one without changing the language of
F, and in this way allow the abstraction to combine the information from both FA. In
particular, before abstracting an FA F = (A_1 ⋯ A_n, π) from a set S of FA, we pre-process
it as follows. (1) We pick automata F' = (A'_1 ⋯ A'_n, π) ∈ S which are compatible with
F in that they have the same number of TA, the same port references, and for each
1 ≤ i ≤ n, the root states of A'_i have the same spans as the root states of A_i. (2) For all
such F' and each 1 ≤ i ≤ n, we add rules and states of A'_i to A_i, but we keep the original
set of root states of A_i. Since we assume that the sets of states of TA of different FA are
disjoint, the language of A_i stays the same, but its structure is enriched, which helps the
abstraction to perform a coarser widening.
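
Step (2) can be illustrated on a single pair of TA. The sketch below uses a hypothetical dictionary representation of a TA (rules plus root states) and checks, only up to a bounded depth, that enrichment leaves the language unchanged. The compatibility check of step (1) and the subsequent abstraction are not shown.

```python
from itertools import product


def trees(rules, state, depth):
    """Trees with at most `depth` levels of nodes accepted from `state`."""
    if depth == 0:
        return frozenset()
    out = set()
    for symbol, children in rules.get(state, ()):
        child_sets = [trees(rules, child, depth - 1) for child in children]
        for combo in product(*child_sets):
            out.add((symbol, combo))
    return frozenset(out)


def language(ta, depth):
    """Bounded-depth language of a TA, taken over its root states."""
    return frozenset().union(*(trees(ta["rules"], q, depth) for q in ta["roots"]))


def enrich(ta, other):
    """Add states and rules of `other` to `ta`, keeping the roots of `ta`."""
    assert not set(ta["rules"]) & set(other["rules"]), "state sets must be disjoint"
    return {"rules": {**ta["rules"], **other["rules"]}, "roots": set(ta["roots"])}


# Binary trees degenerated to a left branch and to a right branch.
left = {"rules": {"l": [("node", ("l", "ln")), ("nil", ())],
                  "ln": [("nil", ())]},
        "roots": {"l"}}
right = {"rules": {"r": [("node", ("rn", "r")), ("nil", ())],
                   "rn": [("nil", ())]},
         "roots": {"r"}}
enriched = enrich(left, right)
```

The enriched automaton has four states instead of two, but the states taken from the other TA are unreachable from the original roots, so the accepted trees do not change until the abstraction collapses states across the two parts.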

6 Experimental Results 

We have implemented the above proposed techniques in the Forester tool and tested 
their generality and efficiency on a number of case studies. In the experiments, we 
compare two configurations of Forester, and we also compare the results of Forester 
with those of Predator [4], which uses a graph-based memory representation inspired
by separation logic with higher-order list predicates. We do not provide a comparison
with Space Invader [12] and SLAyer [1], based also on separation logic with higher-order
list predicates, since in our experiments they were outperformed by Predator.

In the experiments, we considered programs with various types of lists (singly and 
doubly linked, cyclic, nested, with skip pointers), trees, and their combinations. In the 
case of skip lists, we had to slightly modify the algorithms since their original versions 
use an ordering on the data stored in the nodes of the lists (which we currently do 
not support) in order to guarantee that the search window delimited on some level of 
skip pointers is not left on any lower level of the skip pointers. In our modification, 
we added an additional explicit end-of-window pointer. We checked the programs for 
memory safety only, i.e., we did not check data-dependent properties. 

Table 1 gives running times in seconds (the average of 10 executions) of the tools on
our case studies. "Basic" stands for Forester with the abstraction applied on individual
FA only and "SFA" stands for Forester with the abstraction for sets of FA. The value T
means that the running time of the tool exceeded 30 minutes, and the value Err means
that the tool reported a spurious error. The names of the examples in the table contain the
name of the data structure manipulated in the program, which is "SLL" for singly linked
lists, "DLL" for doubly linked lists (the "C" prefix denotes cyclic lists), "tree" for binary
trees, and "tree+parents" for trees with parent pointers. Nested variants of SLL (DLL) are
named as "SLL (DLL) of" and the type of the nested structure. In particular, "SLL of
0/1 SLLs" stands for SLL of a nested SLL of length 0 or 1, and "SLL of 2CDLLs"
stands for SLL whose each node is a root of two CDLLs. The "+head" flag stands
for a list where each element points to the head of the list and the subscript "Linux"
denotes the implementation of lists used in the Linux kernel, which uses type casts and
a restricted pointer arithmetic. The "DLL+subdata" stands for a kind of a DLL with data
pointers pointing either inside the list nodes or optionally outside of them. For a "skip
list", the subscript denotes the number of skip pointers. In the example "tree+stack", a
randomly constructed tree is deleted using a stack, and "DSW" stands for the
Deutsch-Schorr-Waite tree traversal (the Lindstrom variant). All experiments start with a
random creation and end with a disposal of the specified structure while the indicated
procedure (if any) is performed in between. The experiments were run on a machine with the
Intel i7-2600 (3.40 GHz) CPU and 16 GiB of RAM.

Table 1: Results of the experiments

    Example                 basic   SFA     boxes   Predator
    SLL (delete)            0.03    0.04            0.04
    SLL (bubblesort)        0.04    0.04            0.03
    SLL (mergesort)         0.08    0.15            0.10
    SLL (insertsort)        0.05    0.05            0.04
    SLL (reverse)           0.03    0.03            0.03
    SLL+head                0.05    0.05            0.03
    SLL of 0/1 SLLs         0.03    0.03            0.11
    [name illegible]        0.03    0.03            0.03
    tree                    0.14    0.14            Err
    tree+parents            0.18    0.21    2/2     T
    tree+stack              0.09    0.08            Err
    tree (DSW)              1.74    0.40            Err
    DLL (reverse)           0.04    0.06    1/1     0.03
    DLL (insert)            0.06    0.07    1/1     0.05
    DLL (insertsort1)       0.35    0.40    1/1     0.11
    DLL (insertsort2)       0.11    0.12    1/1     0.05
    DLL of CDLLs            5.67    1.25    8/7     0.22
    DLL+subdata             0.06    0.09    -/2     T
    CDLL                    0.03    0.03    1/1     0.03
    SLL of CSLLs            2.07    0.73    3/4     0.12
    SLL of 2CDLLs_Linux     0.16    0.17    13/5    0.25
    skip list_2             0.66    0.42    -/3     T
    skip list_3             T       9.14    -/1     T
    tree of CSLLs           0.32    0.42    -/4     Err

The table further contains the column "boxes" where the value "X/Y" means that 
X manually created boxes were provided to the analysis that did not use learning while 
Y boxes were learnt when the box learning procedure was enabled. The value "-" of 
X means that we did not run the given example with manually constructed boxes since 
their construction was too tedious. If user-defined boxes are given to Forester in ad- 
vance, the speedup is in most cases negligible, with the exception of "DLL of CDLLs" 
and "SLL of CSLLs", where it is up to 7 times. In a majority of cases, the learnt boxes 
were the same as the ones created manually. However, in some cases, such as "SLL of 
2CDLLs_Linux", the learning algorithm found a smaller set of more elaborate boxes than
those provided manually. 

In the experiments, we use folding by inclusion as defined in Sec. 4.2. For simpler
cases, the performance matched the performance of folding by equivalence, but for the
more difficult examples it was considerably faster (such as for "skip list_2" when the
time decreased from 3.82 s to 0.66 s), and only when it was used did the analysis of "skip
lists" succeed. Further, the implementation folds optimal knots of the complexity
≤ 2, which is enough for the considered examples. Finally, note that the performance of
Forester in the considered experiments is indeed comparable with that of Predator even
though Forester can handle much more general data structures.



7 Conclusion 

We have proposed a new shape analysis using forest automata which, unlike the previously
known approach based on FA, is fully automated. For that purpose, we have
proposed a technique of automatically learning FA called boxes to be used as alphabet
symbols in higher-level FA when describing sets of complex heap graphs. We have also
proposed a way to efficiently integrate the learning with the main analysis algorithm.
Finally, we have proposed a significant improvement, both in terms of generality and
efficiency, of the abstraction used in the framework. An implementation
of the approach in the Forester tool allowed us to fully automatically handle programs
over quite complex heap structures, including 2-level and 3-level skip lists, which, to
the best of our knowledge, no other fully automated verification tool can handle. At
the same time, the efficiency of the analysis is comparable with other state-of-the-art
analysers even though they handle less general classes of heap structures.

For the future, there are many ways in which the presented approach can be
further extended. First, one can think of using recursive boxes or forest automata using
hedge automata as their components in order to handle even more complex data structures
(such as mcf trees). Another interesting direction is that of integrating FA-based
heap analysis with some analyses for dealing with infinite non-pointer data domains
(e.g., integers) or parallelism.